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ABSTRACT 

We consider what is the best way to extract science from large surveys of the Milky 
Way galaxy. The diversity of data gathered in these surveys, together with our position 
within the Galaxy, imply that science must be extracted by fitting dynamical models 
to the data in the space of the observables. Models based on orbital tori promise to 
be superior for this task than traditional types of models, such as N-body models 
and Schwarzschild models. A formalism that allows such models to be fitted to data 
is developed and tested on pseudodata of varying richness drawn from axisymmetric 
disc models. 

Key words: galaxies: evolution - galaxies: kinematics and dynamics - Galaxy: disc 
- solar neighbourhood - methods: data analysis 



1 INTRODUCTION 

A major thread of current research is work directed at under- 
standing the origin of galaxies. There are excellent prospects 
of achieving this goal by combining endeavours in three dis- 
tinct areas: observations of galaxy formation taking place 
at high redshift, numerical simulations of the gravitational 
aggregation of dark matter and baryons, and studies of the 
Milky Way. The latter field is dominated by a series of major 
observational programs that started fifteen years ago with 
ESA's Hipparcos mission, which returned parallaxes and 
proper motions for ~ 10 stars (Perryman 1997). Hippar- 
cos established a more secure astrometric reference frame, 
and the UCAC-3 catalogue (Zacharias et al. 2010) uses 
this frame to give proper motions for several million stars. 
These enhancements of our astrometric database have been 
matched by the release of major photometric catalogues DE- 
NIS (Epchtein et al. 2005), 2MASS (Skrutskie et al. 2006), 
SDSS (Abazajian 2009), SEGUE (Yanny et al. 2009), and 
the accumulation of enormous numbers of stellar spectra, 
starting with the Geneva-Copenhagen survey (Nordstrom 
et al. 2004; Holmberg et al. 2007, hereafter GCS) and con- 
tinuing with the SDSS, SEGUE and RAVE (Steinmetz et al. 
2006; Siebert et al. 2011) surveys. Major programs to obtain 
low-dispersion stellar spectra are currently underway - on 
completion the SEGUE and RAVE surveys will each provide 
~5x 10 5 spectra. These spectra yield good line-of-sight ve- 



locities and estimates of 
coarse estimate of [a/Fe] 



[Fe/H] errors of ~ 0.1 dex and a 
Three surveys (APOGEE, ESO- 
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Gaia and HERMES) are currently being being prepared that 
will obtain large numbers of spectra with resolutions in the 
range 20 000 — 40 000 from which abundances of significant 
numbers of elements can be determined. Several important 
photometric surveys are currently under way, including the 
PanSTARRS survey (Kaiser et al. 2002) which is obtain- 
ing griz photometry through a large part of the sky, and 
surveys of the bulge region and the Galactic plane in the 
near-IR with the VISTA telescope. The era of great Galac- 
tic surveys will culminate in ESA's Gaia mission, which is 
scheduled for launch in early 2013 and aims to return photo- 
metric and astrometric data for 10 9 stars and low-dispersion 
spectra for ~ 10 8 stars. 

The Galaxy is an inherently complex object, and the 
task of interpreting observations is made yet more difficult 
by our location within it. Consequently, the ambitious goals 
that the community has set itself, of mapping the Galaxy's 
dark-matter content and unravelling how the Galaxy was 
assembled, can probably only be attained by mapping ob- 
servational data onto sophisticated models. This paper is the 
first in a series in which we use a new method of analysing 
dynamical models to interpret data from surveys of different 
types. Here we introduce and test the basic principles using 
mock survey catalogues of the region b > 30° that contain 
data of Gaia-like quality that progresses in completeness 
from photometry plus proper motions, through photome- 
try, proper motions, parallaxes and line-of-sight velocities. 
In subsequent papers we will extend the formalism to include 
spectrophotometric data and apply it to real catalogues. 

The paper is organised as follows. Section 2 explains the 
fundamental importance of equilibrium dynamical models 
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for the problem in hand. Section 3 discusses the feasibility 
of using N-body models to interpret surveys of the Milky 
Way and outlines our preferred strategy. Section 4 presents 
the formulae used to calculate the likelihood of a catalogue 
given a model, while in Section 5 we explain how we find the 
optimum values of the model's parameters and their statis- 
tical uncertainties. Section 6 describes how we construct a 
pseudo-catalogue from a model and a number of tests of our 
method that we have run using a Gaia-like pseudo-catalogue. 
Section 7 outlines a number of directions for further work 
and future developments of our methodology. Section 8 sums 
up. 



2 WHY WE NEED GALAXY MODELS 

Models of the Galaxy have a key role to play for several 
reasons. First they enable one to understand and compen- 
sate for observational biases, which dominate all data sets 
on account of our location in the Galaxy's dust- and gas- 
rich disc. Second they provide the natural means of tying 
together data from different surveys - surveys may concen- 
trate on obtaining photometric, astrometric or spectral data 
and different surveys probe magnitude ranges and therefore 
constrain the Galaxy over complementary distance ranges. 
A model can assemble this complementary information into 
a single coherent picture. 

In the 1980s Bahcall and his collaborators moved Galac- 
tic astronomy an important step forward by introduc- 
ing models of the stellar content of the Milky Way that 
were inspired by observations of external galaxies (Bahcall 
& Soneira 1984; Ratnatunga et al. 1989, and references 
therein). Although these models included the kinematics 
of stars, velocities were not required to be consistent with 
Newton's laws of motion. In the Besancon model (Robin 
et al. 2003), Newton's laws are used to connect the verti- 
cal structure of the disc to the distribution of vertical ve- 
locities, W, but the large-scale structure of the Besangon 
model of the Galaxy is not constrained by Newton's laws. 
The most recent incarnation of this type of model is pre- 
sented by Sharma et al. (2011). 

Fully dynamical models, that is models in which the dis- 
tribution of stars is consistent with Newton's laws of motion, 
are now required for several reasons. Most obviously, the dis- 
tribution of dark matter can be deduced from observations 
of stars and gas only by assuming that the Galaxy is in an 
approximate steady state, and interpreting observations in 
the framework of an equilibrium dynamical model - if we do 
not assume a steady state (which strictly speaking must be 
a false assumption) , any mass distribution is consistent with 
any phase-space distribution of the baryonic matter. Infer- 
ences about the mass content can only be drawn because a 
sufficiently large central concentration of mass would imply a 
rapid collapse of the baryonic component from its observed 
configuration, and a sufficiently small mass concentration 
would imply that the baryonic component would fly apart. 
We must deduce the actual mass distribution by assuming 
that the baryonic component is in an approximate steady 
state, thus ruling out large-scale contraction or expansion. 
Hence equilibrium dynamical models are fundamental for 
achieving a major goal of contemporary astronomy. 

An attractive feature of equilibrium dynamical models 



is that they reduce the Galaxy from a six-dimensional to 
a three-dimensional object: without the assumption of dy- 
namical equilibrium, we have to specify the distribution of 
stars in six-dimensional phase space, while the assumption 
of equilibrium together with Jeans' theorem makes it suffi- 
cient to specify the distribution of stars in three-dimensional 
space of isolating integrals. For obvious reasons, objects can 
be imagined in three dimensions much more easily than they 
can be imagined in four or higher dimensions, so the reduc- 
tion in dimensionality from six to three is a major simpli- 
fication. This reduction also vastly reduces the amount of 
information required to specify a model: if we measure each 
position and velocity with a precision of, say, 1 percent, a six- 
dimensional model with less than 10 resolution elements 
will degrade the resolution of the raw data, while a three- 
dimensional model requires only 10 6 resolution elements for 
a faithful representation of the data. 

A dynamical model connects objects that we can ob- 
serve to objects that we have not observed, either because 
they are distant and therefore faint, or because they are ob- 
scured by dust. For example May & Binney (1986) showed 
that if the stellar halo were in virial equilibrium, more than 
half the stars of the stellar halo would be on orbits that 
bring them through the solar neighbourhood, so in principle 
from measurements made in a small volume around the Sun 
we could determine the density of stars on more than half 
the halo's populated orbits. 

The Galaxy is certainly not in a steady state. Most 
obviously because it has a bar inside ~ 3kpc and spiral 
arms in the surrounding disc. These features not only rotate 
with various pattern speeds, but on longer timescales change 
their structure and may well decay. Moreover, by scattering 
stars and dark-matter particles they drive secular evolution 
of all the Galaxy's components. The resulting secular evo- 
lution of the disc has been studied for half a century (e.g. 
Roman 1954; Spitzer & Schwarzschild 1953; Carlberg & Sell- 
wood 1985; Binney & Lacey 1988; Dehnen & Binney 1998; 
Schonrich &: Binney 2009) and has clearly played a major 
role in determining the observed state of the Galaxy. Secular 
evolution is most readily modelled by adding perturbations 
to the Hamiltonian of an equilibrium model, that cause the 
model to move through a series of equilibrium states. Hence 
equilibrium models are the key to modelling secular evolu- 
tion as well as to determining the distribution of dark mat- 
ter. 

Data from the SDSS survey revealed that a significant 
proportion of the stellar halo has yet to phase mix prop- 
erly, and that it contains numerous tidal streams (Bell et al. 
2008). These streams have considerable potential for map- 
ping the Galaxy's gravitational field and distribution of dark 
matter. Galaxy models of the type we advocate in this pa- 
per would seem to offer the best hope of fully exploiting the 
potential of tidal streams (Eyre & Binney 2011). 

The complexity of the Galaxy is such that we cannot re- 
alistically hope eventually to build a dynamical model that 
accounts for every observation in detail. For example, it is 
known that stellar discs are highly responsive objects, and 
much of the Galaxy's current spiral structure will represent 
an ephemeral response to noise driven by star formation 
and fluctuations in the density of the dark-matter halo. The 
Galaxy's warp is likely to be driven by such density fluctua- 
tions. In these circumstances we should aim for a sequence of 
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dynamical models of increasing complexity and realism. We 
aim first for an axisymmetric, equilibrium model and iden- 
tify the most prominent features in the data that cannot be 
explained by such a model. Then we aim for a steady-state 
barred model and see how much more of the data such a 
model can explain. Next we include either the warp or spi- 
ral structure by perturbing our best barred model, and see 
how much better we can fit the data. By proceeding in this 
way we can hope to reach the point at which we feel we 
have a model that provides as good a fit to the data as it is 
reasonable to expect, and we will have learnt along the way 
a great deal about the Galaxy's contents and manner of op- 
eration. In this paper we are concerned with the first stage 
in this journey, the construction of axisymmetric models. 



3 TYPES OF GALAXY MODEL 
3.1 N-body models 

The simplest galaxy models to construct are N-body mod- 
els. The initial conditions from which an N-body model is 
integrated are generally not consistent with an equilibrium 
model because such initial conditions can only be chosen 
if one is already in possession of the desired model. There- 
fore the equations of motion must be integrated for some 
time to allow the model to settle to an equilibrium through 
relaxation. This period of relaxation has two unfortunate 
consequences. First, equilibrium is reached only asymptot- 
ically in time, and in the case of a disc galaxy significant 
relaxation may continue for an inconveniently long time. 
Second, it is not clear what initial conditions are required 
to achieve a given configuration. Recently the "made-to- 
measure" (M2M) technique introduced by Syer & Tremainc 
(1996) has been sharpened by de Lorenzi et al. (2007), 
Dehnen (2009) and others into an effective way to refash- 
ion a model that somewhat resembles the target galaxy into 
a model that represents that galaxy to good accuracy. Con- 
sequently the dynamical models that currently best approx- 
imate the Galaxy are M2M models (Bissantz et al. 2004; 
Martinez- Valpuesta & Gerhard 2011). 

While M2M models can be extremely useful, they are 
less than ideal from a number of perspectives. First, an M2M 
model is specified by N phase-space coordinates and particle 
weights, where N > 10 5 is a large number. This specification 
is at once cumbersome and non-unique; at a later time the 
coordinates will be different while the model will be the 
same. An equally good model of the same galaxy made by 
a different group will use a different mix of orbits so the 
weights will be different and it will not be evident that the 
two models are equivalent. Second, when the model's phase 
space has significant stochastic regions, it is not possible 
to halt dynamical evolution of the model during the period 
that orbits are followed in order to optimise their weights. 
Finally, there is a fundamental problem with any particle- 
based model of the Galaxy that was pointed out by Brown 
et al (2005): important information about the Galaxy is con- 
tained in observations of low-luminosity stars that can only 
be observed close to the Sun, so such stars must be rep- 
resented in any complete Galaxy model. However, orbiting 
particles that are at one moment near the Sun are some time 
later far from the Sun. If particles are treated as stars, those 



that represent low-luminosity stars will rarely contribute to 
observables because the particle will usually to be too far 
from the Sun to be visible. Consequently, the number of 
particles contributing to observables will be much smaller 
than the number of particles in the model. One might hope 
to circumvent this problem by considering particles to be 
representatives of a stellar population that has a well de- 
fined luminosity function. Then when a particle is far from 
the Sun, one could be sure to draw a representative star lu- 
minous enough to be visible from the Sun. But then for con- 
sistency many stars must be drawn from the particle when 
it is near the Sun. Clearly the model's shot noise will be 
adversely affected if each nearby particle contributes to the 
observables a host of stars with exactly the same phase-space 
coordinates, just as it will be if the number of stars is much 
less than the number of particles. 

To escape from this sampling problem we need to be 
able to sample the solar neighbourhood much more densely 
than far-flung parts of the Galaxy. That is, our model must 
yield the probability density of particles rather than a spe- 
cific realisation of this density - we need to know the phase- 
space density / of stars, for armed with / we can populate 
the solar neighbourhood with large numbers of mostly low- 
luminosity stars and remote parts of the Galaxy with smaller 
numbers of exclusively luminous stars. 

In principle it is possible to determine the phase-space 
density / from an N-body model because Liouville's theo- 
rem assures us that phase-space density is constant along 
orbits, so the phase-space density is known at the location 
of every particle provided the initial conditions sampled a 
well defined sampling density / s (e.g. §4.7.1 of Binney & 
Tremaine 2008). Hence the phase-space density at any point 
in the phase space of the final model can be estimated by six- 
dimensional interpolation. Sampling the phase-space density 
obtained in this way is not easy, but is achievable with the 
Metropolis algorithm for example (e.g. Binney et al. 1992, 
§4.3). However, these steps are highly non-trivial and to the 
best of our knowledge no N-body model has been success- 
fully resampled. 

The problem of determining the distribution function 
(df) / is made more challenging by the fact that we need 
not one df but a df f a for each stellar population a: as 
a minimum the model must predict the spatial distribution 
and kinematics of stars that lie in each of several regions of 
the ([Fe/H], [a/Fe]) plane (e.g. Schonrich & Binney 2009). It 
would not be straightforward to adapt an N-body model so 
that it yielded several dfs simultaneously, although ab initio 
models of galaxy formation can assign stellar parameters to 
particles (e.g. Simon et al. 2010, and references therein). 

3.2 Torus models 

The modelling technique introduced by Schwarzschild 
(1979) has become the standard tool for analysing the dy- 
namics of early-type galaxies (e.g Krajnovic et al. 2005), es- 
pecially in connection with searches for central black holes 
(e.g. Gebhardt et al. 2003). This technique involves first in- 
tegrating a number of orbits in the adopted gravitational 
potential and then seeking weights for these orbits such that 
the weighted sum of the orbits reproduces the observational 
data to acceptable precision. 

Torus modelling is analogous to Schwarzschild mod- 
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elling except that orbits, which are essentially time series of 
phase-space points, are replaced by orbital tori, which are 
analytic expressions for the three-dimensional continuum of 
phase-space points that are accessible to a star on a given 
orbit. Whereas an orbit is labelled by its initial conditions, 
a torus is labelled by its three action integrals Ji\ whereas 
position on an orbit is determined by the time t elapsed 
since the initial condition, position on a torus is determined 
by the values of three angle variables 9i, one canonically 
conjugate to each action Ji. For a list of advantages aris- 
ing from the replacement of orbits by tori, see Binney & 
McMillan (2011), and for a summary of how orbital tori are 
constructed and references to the papers in which torus dy- 
namics was developed, see McMillan &: Binney (2008). Note 
that torus models provide the ideal framework within which 
to study features such as spiral arms or warps because they 
come with angle-action variables; these coordinates were in- 
vented for perturbative studies of the solar system and, as 
Kaasalainen (1994) showed, perform spectacularly well when 
they have been derived by constructing tori. 

Although it is possible to treat the weights of tori as 
independent unknowns to be fitted to the data just as in 
Schwarzschild modelling, a better procedure is to derive the 
weights from the model's df. By Jeans' theorem, the df of 
a steady-state model can be taken to be a function /(J) of 
the action integrals. Following Binney (2010; hereafter B10) 
we make / an analytic function of certain parameters Oj in 
addition to J, and we fit the model to the data by varying 
the cii. 

The impact of shot noise on the model is minimised if 
all tori have the same weight, and this will be the case if 
the density of used tori in action space samples the df. We 
endeavour to ensure that this condition is met, at least to a 
good approximation. 



4 ASSESSING A MODEL 

Our modelling strategy is as follows. For each trial gravita- 
tional potential we construct a library of orbital tori that 
has a known sampling density in action space. For each of 
the Galaxy's identified stellar populations (such as G-dwarfs 
or K-giants of given metallicity) we have a luminosity func- 
tion F(M) and a trial df with parameters that have to be 
chosen for that population. For given values of these pa- 
rameters, we can infer the weight of each of the library's 
tori from the df. Given these weights and the numerically- 
determined mapping from actions and angles to Cartesian 
phase-space variables, we can find the probability of finding 
a star of a given absolute magnitude at any point (x, v) in 
phase space, and therefore at any point in the space of ob- 
servables. Hence for given values of the parameters we can 
evaluate the likelihood of the data given the model that is 
defined by the current gravitational potential and the cur- 
rent parameters. We find the values of the parameters that 
maximise the likelihood for the given potential, and then 
repeat the process for a different potential. In this way we 
determine what range of gravitational potentials and dfs is 
consistent with the data. 

The Galaxy's gravitational potential $ is generated by 
the sum of the mass densities of all the Galaxy's popula- 
tions, including that of its dark-matter particles. Our knowl- 



edge of <J> remains quite limited, so in the coming years the 
challenge is to use the kinematics of stars and gas to con- 
strain $ more tightly. Then Poisson's equation can be used 
to derive the dark-matter density as the difference between 
the density that generates $ and the density of stars and gas. 
Once the distribution of dark matter is fairly well known, 
it will be appropriate to seek a df for the dark matter and 
construct a completely self-consistent model of the Galaxy. 
At the present stage of our understanding a concern for self- 
consistency would be premature because many dfs for dark 
matter will be consistent with any plausible density distri- 
bution of dark matter. Consequently, nothing is to be gained 
by specifying the dark matter's df until there are kinematic 
constraints on it, presumably provided by experiments that 
detect dark-matter particles underground. 

In this paper our focus is on the determination of the 
df of stars given typical observational data and a trial form 
for $. We defer to a subsequent paper a study of the effects 
of changing the assumed form of $ and thus the extent to 
which given data allow one to constrain <!>. 

4.1 Basic notation 

We now lay out the formulae that are required to implement 
the strategy just described. The data consist of a catalogue 
that for TV stars gives accurate values of the Galactic coor- 
dinates (b, I), data such as apparent magnitude m, colour 
V — I, and line-of-sight velocity v\ oe that have moderate er- 
rors, and values of the parallax zu, proper motion fi, surface 
gravity g and metallicity Z that are probably significantly 
in error. We group the variables into two sets, the basic vari- 
ables 



u = (6, l,m, w, H,v\os) 

and additional astrophysical variables 

B =(V-I,g,Z). 



(1) 



(2) 



Note that u has seven components, effectively a star's phase- 
space coordinates (x, v) and its apparent magnitude m. For 
now we neglect interstellar extinction. Then the star's abso- 
lute magnitude M is effectively specified by u because the 
star's distance is fixed by x. 

We assume that the errors in the observed quantities are 
independent and can be modelled by Gaussian probability 
distributions 

-(u-u) 2 /2a 2 



G(u, u, a) = 



(3) 



V2na 2 

We attach primes to the true values of measured quantities 
to distinguish them from the measured values, so we gener- 
ically have that the probability of measuring a value u is 



(4) 



Pr D (u) = J du G(u, u , a) Pr mode i(w'), 

Any quantity such as the parallax that is not given in the 
catalogue can be considered to have a sufficiently large a 
that the Gaussian density is effectively constant for all rel- 
evant values of the variable. 

For brevity we use the notation 



G-(u, u',er) = G(u k , u' k , a k ). 



(5) 
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In this paper we restrict ourselves to the case of a sin- 
gle stellar population. This assumption ensures that there 
are no correlations between stellar type and kinematics: the 
distribution of stars in phase space is independent of their 
luminosities, colours, metallicities, etc. In this case we can 
confine discussion to the components of u and neglect s. 
We assume that the luminosity function F(M) satisfies the 
normalisation condition 



1 



J — C 



dMF(M) 



(6) 



(7) 



and that the DF /(J) is normalised such that 
1= [ d 3 Jd 3 0/(J). 

Jail 3,6 

Note that (3,6) determines (x,v), so the set (M, 3,6) fixes 
the true values of a star's observables, u'. 

We require the probability of observing a star with ob- 
servables in d 7 u' given that the star satisfies the selection 
criteria of the survey. If dP(u') is the probability that a star 
chosen randomly from the Galaxy as a whole lies in d 7 u' 
around u', then 



dP(u') = dP(u'jin survey)P(in survey). 
But 

dP(u') = /(J)F(M)^^dV, 



(8) 



(9) 



P(u'|in survey) d 7 u' 



f(3)F(M) d(M,J,6) d r u , 
P(in survey) <9(u') 



To take into account observational errors, we fold the prob- 
ability distribution (10) with the Gaussian kernel and have 
that the probability density of stars in the space of observ- 
ables that is predicted by the DF / is 

Pr (u|/)= J dV Gl(u,u',a)P(u'\m survey) 

/ d 3 J /(J) / d 3 / AM F(M)G(u, u', a) 



P(in survey) 



(11) 



where u'(M, 3,6) is the vector of observables corresponding 
to the given absolute magnitude and phase-space position. 

Since Pr (u|/) is a properly normalised probability den- 
sity function (pdf), we have 



(12) 



P(in survey) = J d 3 J f(3) J d 3 6 J AM F(M) 

x / d 7 uG(u,u',er) 

Js 

where the subscript S on the second integral implies inte- 
gration through the range of observables encompassed by 
the survey. Usually, there is no restriction on the velocities, 
so the Gaussian factors in nt,, etc, integrate out to unity. 
Similarly the parallax will not be restricted a priori, so its 
Gaussian factor will integrate to unity. We assume that the 
Gaussian factors in b and I integrate to either zero or unity, 
depending on whether the true values b' and I' lie in the 
surveyed region because the observational errors in the sky 
coordinates are so small. Hence only the integral over ap- 
parent magnitude m need be considered, and in this pa- 
per we make the assumption that the errors in m are small 



compared to the width of the luminosity function, so we as- 
sume that the remaining integral is zero or unity depending 
on whether the true apparent magnitude m! lies within the 
survey limits. Thus we adopt 



P(in survey) = J d 3 J /(J) J A 3 6 J 



AMF(M), 



(13) 



where r(3, 6) is the heliocentric distance of the specified 
point in phase space and 



M cr it (r) = mum - 51og 10 (r/10pc) 



(14) 



with mii m the limiting apparent magnitude of the survey. 
We define the survey's selection function to be 

I- M crlt (r) 

AMF(M), (15) 



0(J) = / d 3 

Js 

Then we have 
P(in survey) 



J/(J)0(J), 



and (11) can be written 

/ d 3 J /(J) J s A 3 6 J AM F(M)G(u, u', a) 



Pro(u|/) 



Jd3J/(J)0(J) 



(16) 



(17) 



From these individual probabilities we construct the log like- 
lihood of a survey for a given model: 



L^^ln pr„(ua|/)], 



(18) 



where the sum is over the stars in the survey. In an appendix 
we demonstrate explicitly that L is stationary when the trial 
DF used to evaluate it is equal to the DF that was used to 
produce the catalogue being analysed. 

The integrals over J in equation (17) are most conve- 
niently done by Monte-Carlo integration. We arrange that 
the points at which the integrand is evaluated sample the 
DF, so we may replace J d 3 J /(J) by iV -1 ^ fe : 



Pro(u|/) = 



f s d 3 efdMP(M)G 7 (u,u' fc ,<r) 



(19) 



where u' k (M,3 k ,6). 

The evaluation of the integral in equation (19) can be 
simplified by taking advantage of the fact that the errors in 
(b, I) are small so the integrand is non-negligible only when 
(6', I') lies close to (6, 1). We can approximate the Gaussians 
in these variables as ^-functions and integrate them out an- 
alytically: doing so we introduce a Jacobian because 



/ 



d 3 6> 6(b-b')S(l-l') 



I 
I 



Ab'Al'dvj' 
Avj' 



0(6) 



d(b',l',uj') 
8(6) 



5(b — b')S(l — l') 



d(b,l,vj') 



(20) 



The evaluation of the Jacobian is described in the Appendix 
of Binney & McMillan (2011). Now we have 



ProH/) = 



9(6) 



d(b,l,r') 



J AMF(M)Gl(u,u' k ,<T) 



-,(21) 
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where the integration over the true parallax zu' — 1/r' has 
been transformed into an integration over true heliocentric 
distance. The integral over M simply returns the fraction of 
the stellar population that is consistent with the measured 
apparent magnitude at distance r'. We use equation (21) for 
Pr (u|/) in equation (18) when evaluating the likelihood. 

We determine 0(J) as follows. For each value of J we 
choose at random a point Oi in angle space. The point cor- 
responds to a distance r and sky coordinates (6, 1). If these 
coordinates lie in the survey volume, stars with brighter ab- 
solute magnitudes than the value M cr it(f) from equation 
(14) will enter the catalogue. After a large number N of 
values of have been explored, we have 

■> N rM CTit (n) 
^~nY, dMF(M). (22) 

Since <f> does not depend on /, it needs to be evaluated only 
at the start of the procedure for optimising /. 

The survey contains no information about the value 
of /(J) at actions for which 0(J) = 0, so it is immaterial 
whether we change the value of / at these actions. When 
we adopt a functional form for /, we are in fact varying / 
in these invisible regions, and thus at the end of the process 
arrive at a prediction for that can be tested by a deeper or 
wider survey. 

4.2 Binning the data 

The computational cost of evaluating the likelihood of a cat- 
alogue scales as the product of the number of stars in the 
catalogue and the number of tori used to sample the model. 
Even for a catalogue that contains ~ 10 J stars, the cost is 
high. We now investigate the scope for reducing the cost by 
binning the data. 

The log likelihood (18) is a sum of the contributions 
from individual stars. Since the contribution from an indi- 
vidual star is practically unchanged by shifting its point u 
in data space by a small amount within the core of its er- 
ror ellipsoid, the calculated likelihood of the catalogue will 
not change much if we group stars with data points that lie 
within a small cell in data space and attribute to all of them 
the contribution to L from the centre of the bin. A slightly 
refined version of this basic idea is as follows. 

We establish a Cartesian grid in data space, the cell 
spacing in most dimensions being of order half the typical 
observational uncertainty of the corresponding observable. If 
we applied this criterion to I and b, we would obtain an ab- 
surdly fine grid on the sky. Therefore, in I and b we take the 
separation between bins to be comparable to the changes 
in angle over which we expect the distribution of stars to 
change significantly - this might be as large as 10° in I and 
a degree or so in b at low latitudes and larger increments 
near the poles. Having established our grid, we assign stars 
to cells. Finally we evaluate Pr at the centre of each cell and 
form L by adding the logarithm of this value times the num- 
ber of stars assigned to that cell. The saving in computation 
from binning is clearly proportional to the number of stars 
assigned to each cell, which increases with the adopted bin 
sizes and decreases with the number of quantities actually 
observed. Hence binning is most attractive for the sparsest 
data set, namely measurements of (6, 1, m, fi). However, this 



is already a five-dimensional space. For a survey of a third of 
the sky, it might be possible to get by with 100 bins on the 
sky. A few tens of bins would be required for the apparent 
magnitudes, and for each component of fi ten bins might suf- 
fice. Thus a minimal grid will have > 10 5 bins, so even with 
the simplest conceivable data, binning first becomes advan- 
tageous when the number of stars in the catalogue exceeds 
a million. In the case of Gaia, the list of observables must 
be expanded to include at least zu, and the number of bins 
required to do justice to the proper-motion data must be in- 
creased by a factor of at least 10, implying a grid with > 10 6 
bins. In reality one would want to include colour data, and 
some linc-of-sight velocities, and the number of bins would 
be pushed up to ~ 10 9 , essentially the number of stars in 
the catalogue. 

Thus on account of the high dimensionality of the prob- 
lem posed by current and future surveys of the Galaxy, the 
prospects for binning reducing the computational cost of fit- 
ting models are not bright. 



5 OPTIMISING THE DF 

The formulae of the last section are used to evaluate the log 
likelihood L of a catalogue given a particular model. In this 
section we explain how we estimate the probability distri- 
butions of the parameters that appear in the DF. We do this 
with the Markov Chain Monte-Carlo (MCMC) algorithm: 
we choose some plausible set of parameters a for the DF and 
evaluate L for these parameters. Then we generate a random 
change 8a in the parameters and evaluate L for the DF with 
parameters a' = a + <5a. If the new value L' exceeds the old 
value L, we set a = a', while if L' < L we set a = a' with 
probability exp(L' — L). After a sufficient time, the resulting 
sequence of values of a sample the underlying PDF of a (e.g. 
§4.3 of Binney et al. 1992). 

Thus our approach to model fitting employs Monte- 
Carlo sampling in two distinct ways. To evaluate L for any 
given parameters a we evaluate the action-space integral by 
Monte-Carlo sampling action space, and to determine the 
PDF of the parameters of / we Monte-Carlo sample param- 
eter space. 

It is useful to keep a record of the individual contribu- 
tions to the sums over k in equation (21) for the following 
reason. Once the likelihood of the data given the current DF 
/ has been evaluated, a neighbouring DF, /, is chosen, and 
the likelihood is re-evaluated. The new value of any sum 
over k can be obtained by, in equation (21), making the 
replacement 

^^]T[/(J fc )//(J fc )]- (23) 

k k 

Consequently, if when we first evaluate the likelihood of the 
catalogue we record for each star the contribution that each 
torus makes to the probability of seeing that star, we can 
evaluate the likelihood of the catalogue for any other DF at 
speed. The price of this speed is having to record an array of 
size the number of stars in the catalogue times the number 
of tori employed. Fortunately, the array is somewhat sparse 
because a significant fraction of tori make a negligible con- 
tribution to the likelihood of a given star. The more precise 
the data are, the sparser the array becomes. 
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A key to computational efficiency on the first evaluation 
of the likelihood of the catalogue is to identify in advance for 
each star the limited number of tori that contribute signif- 
icantly to the integral in equation (21). To do so efficiently 
we create, for each torus, a grid in I, b and heliocentric r, and 
store the ranges of possible vi , v b and V\ os for each bin in the 
grid that the torus goes through (found to a good approx- 
imation by sampling at many 8 from the torus). With this 
information we can then, for each star, quickly discard tori 
which are clearly not relevant because the bins associated 
with the appropriate line of sight are either empty or have r 
and/or velocity ranges which are clearly incompatible with 
the observations. 

For a catalogue of significant size the extra computa- 
tional effort required to produce these grids is small com- 
pared to the computational effort saved by reducing the 
number of integrals that need to be performed because a 
grid only needs to be computed once per torus, rather than 
once per star per torus. In the examples discussed in Sec- 
tion 6 the grids reduce the number of line-of-sight integrals 
which need to be found by between a factor of ~10 when 
only fi is provided, and a factor of ~50 when /x, v\ os and w 
are provided. 

5.1 How many tori do we need? 

As just explained, estimates of L for many different param- 
eter values are obtained with the same set of representative 
tori, and the final optimum values of the parameter and their 
uncertainties are usually based on a single set of sampling 
tori. Unless this set is sufficiently rich, the parameter esti- 
mates are likely to be biased in the sense that their optimum 
values differ from the true values by more than the returned 
uncertainties: the latter reflect both the input observational 
errors and the statistical uncertainty arising from the finite 
number of stars in the catalogue we are fitting. They do not 
include statistical error associated with the use of a finite 
number of tori to evaluate integrals over action space. 

So, how many tori do we need to use? It is evident that 
at least one torus must yield a point u' in the space of ob- 
servables that lies within ~ la of the measured location of 
each star. Stars for which only one torus contributes points 
to the la error ellipsoid tend to bias the parameter values: in 
order to make such stars probable the computer has to make 
that torus probable; if this torus differs significantly from the 
star's actual torus, the probability associated with that star 
is being misplaced within action space, and we infer an in- 
correct df. For safety we require several tori to contribute 
points to the la error ellipsoid, for then the greatest weight 
can be assigned to whichever one of them lies close to the 
star's true torus. Clearly the higher the quality of the cata- 
logue, the smaller are the error ellipsoids and the more tori 
are needed to fulfil the criterion that several tori contribute 
to every la error ellipsoid. 

We can quantify this idea as follows. Equation (21) gives 
the probability of finding a star with observables u as a sum 
of contributions from individual tori. If we define 



fdr' 


9(0) 
d(b,l,r') 


J dMF(M)Gl(u,u' k ,<r) 


Pr D ( 


u|/)E fc ^(JfcO 



(24) 



then we have ^ fc pt = 1 and the contribution of the fcth 



torus to Pr (u|/) is proportional to pk- The Shannon en- 
tropy of the probability distribution {pk}, 

N 

S(u\f) = -J^Pk^Pk, (25) 
k 

is the natural measure of the extent to which the probability 
Pr (u|/) is contributed by a large number of terms in the 
sum or dominated by a single largest contribution: S van- 
ishes if there is one dominant contribution, so pk = 5fefe , 
and peaks at S = In N when p k — 1/N for all k; if there are 
just n <C N tori that might plausibly produce the data for 
a given star, n of the p k will be of order 1/n and the rest 
will vanish, so the entropy will be of order S ~ Inn. Hence 
e s is quite generally an estimate of the number of tori that 
are consistent with the star's data. Using too few tori to 
conduct the sum in equation (21) is signalled by some stars 
having values of e s smaller than unity/a few and results in 
the formal errors on the parameters being too small, with 
the result that the true parameters lie outside the ~ 1.5a 
error ellipsoid. 

While having e s ~ 1 suggests that too few tori are be- 
ing used, the following argument shows that a larger value 
of e s does not guarantee that enough tori are being used to 
give reliable results. Two small samples of tori drawn from 
different dfs can by chance have almost identical distribu- 
tions in action space. Suppose this distribution happens to 
fit the data perfectly. Then the two different parent dfs will 
appear to maximise the likelihood of the data, depending on 
which df the tori were in fact drawn from. The best way to 
check that enough tori are being used is to draw a new sam- 
ple of tori from the df that maximises the likelihood with 
the original sample of tori, and then to repeat the maximi- 
sation process using the new tori in the analysis. If enough 
tori were used, the new pdf of the df will differ negligibly 
from the old one. 



6 TESTS 

In this section we test the ability of our procedure to recover 
from a mock catalogue the df from which the catalogue was 
obtained. Hence we (i) build a dynamical model with known 
df, (ii) draw a sample of pseudodata from this model by "ob- 
serving" it from the location of the Sun and then add errors 
to the "observations" , and (iii) use the algorithm described 
in Section 5 to constrain the df. In this final step we assume 
the correct functional form for the df and enquire how ac- 
curately we can recover from the pseudodata the values of 
the df's parameters. We take the distance from the Galactic 
Centre to the Sun to be Ro = 8.5 kpc. 



6.1 The adopted luminosity function 

Since the velocity we infer for a star from its proper motion is 
proportional to the star's distance, whatever distance infor- 
mation we have is going to be crucial for the model-fitting 
process. For simplicity, we are assuming that the Galaxy 
contains a single stellar population, and we are not using 
colour information. Consequently, it suffices to specify the 
luminosity function. We use a simple polynomial approxi- 
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Table 1. The parameters of the DF of the model that contains 
both thin and thick discs 



Figure 1. Luminosity function described in Section 6.1, and used 
in all tests. 



mation to the general V-band luminosity function described 
in Binney & Merrifield (1998) Table 3.16: 

(-14.9 + 21M-5.4M 2 
F(M) oc I +0.59 M 3 - 0.019 M 4 for 1 < M < 19 (26) 
[ otherwise. 

This function is plotted in Figure 1. 



6.2 The adopted gravitational potential 

We use the Galactic potential which McMillan (2011) gives 
for a "convenient" Galaxy model. This axisymmetric model 
consists of a Galactic bulge, thin and thick exponential discs, 
and a Navarro, Frenk & White (1996) halo. The potential 
defines the Local Standard of Rest (LSR) and we assume 
that the Sun's velocity with respect to the LSR is that given 
by Schonrich, Binney & Dchncn (2010). 



6.3 The adopted distribution function 

We have run tests using two model Galaxies, both based on 
a "quasi-isothermal" DF (Binney & McMillan 2011) 



f(Jri L z , t7 z ) — fcr r { Jri L z 

where 



2%ai 



fa r {Jr, Lz) 



[1 + tanh(L z /L )]e~ 



(27) 



(28) 



Here Q(L Z ) is the circular frequency for angular momen- 
tum L z , k(L z ) is the radial epicycle frequency and v(L z ) is 
its vertical counterpart. E(L Z ) = T, e~ (R ~ Rc)/Rd is the (ap- 
proximate) radial surface- density profile and we set R& — 
3kpc, where R C (L Z ) is the radius of the circular orbit with 
angular momentum L z . The factor l + tanh(L z /Lo) in equa- 
tion (28) is there to effectively eliminate stars on counter- 
rotating orbits; the value of L is unimportant in this study 
provided it is small compared to the angular momentum of 
the Sun. In equations (27) and (28) the functions a z (L z ) and 
a r (L z ) control the vertical and radial velocity dispersions. 
The observed insensitivity to radius of the scale-heights of 
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extragalactic discs motivates the choices 
<Tr(L z )=a r oe«"°- R ° ) ' R * 



(29) 



where q = 0.45 and a r o and cr z o are approximately equal to 
the radial and vertical velocity dispersions at the Sun. For 
discussion of the value of q and these choices of functional 
form, see B10. 

The first distribution function we consider is a sin- 
gle thin quasi-isothermal disc, which was constructed with 
Rd — 3 kpc, Lo = 10 kpc km s -1 and a r o = <J z o = lOkms -1 . 
This is rather dynamically cold, but we use it as a simple 
example to demonstrate and test the basic principles of the 
apparatus. 

We have also tested the apparatus on a more realistic 
model Galaxy that has both thin and thick discs, with the 
thick disc contributing 23 per cent of the total disc surface 
density at the Sun. Table 1 gives the values of the parameters 
in the DF of this model. 

B10 showed that by superposing a large number of dfs 
of a very similar form to these quasi-isothermal dfs, one can 
obtain a model that is consistent with the local stellar den- 
sity and velocity distribution. In this study we, like Binney 
& McMillan (2011), restrict ourselves to these simple one- 
or two-disc models in order to provide some straightforward 
demonstrations of the principles discussed in Section 4. Ex- 
tending this work to the more complicated dfs used by B10 
is in principle straightforward. 



6.4 Constructing a pseudo-catalogue 

To construct a pseudo-catalogue of stars, we first randomly 
sample the distribution function in action space. If we take 
care to choose / so it can be evaluated without actually 
constructing a torus (which requires that it does not con- 
tain exact orbital frequencies) very little work is involved 
in evaluating /(J) so MCMC sampling, in which at best a 
third of the points investigated are actually selected, is not 
expensive. For each chosen J a torus is constructed. 

Our tests are performed with a pseudo-catalogue that 
is magnitude limited with maximum apparent magnitude 
m — 17, and we confine the pseudo-catalogue to b > 30° be- 
cause we have neglected extinction. The shape of the survey 
volume is then comparable to that of recent surveys (e.g. 
SDSS, RAVE). For each computed torus we evaluate the se- 
lection function </>(J) as described in Section 4.1. For some 
of our selected values of J, the selection function will vanish 
because the torus does not intersect the survey volume. Such 
tori may be excluded from the list and it is in fact conve- 
nient to identify many of these tori in advance of computing 
them. Specifically, for a reasonable number of points in the 
J r , J4, plane we find the minimum value of J z that takes a 
torus with given J r and J$ into the survey volume - this is 
conveniently done using the adiabatic approximation (Bin- 
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Figure 2. Tests of our ability to recover the parameters of the DF a catalogue of 5 000 stars. Each panel shows the pdf of the DF in 
the ((T r Oi cr zo) plane. The data from which these pdfs were recovered included photometry and proper motions in all three cases. The 
extreme left panel gives the pdf when no additional data arc available, the centre panel shows what is achieved by adding parallaxes, 
and the rightmost panel shows the pdf when both parallaxes and line-of-sight velocities are available. Contour levels are at n/9 of the 
peak probability density, where n = 1, . . . , 8. The cross marks the true value of the parameters. 



Table 2. Means and dispersions in kms -1 of parameters of the DF recovered from data made with just a thin disc. The first three 
columns give results for a catalogue with 5 000 stars in a catalogue with Gaia-like measurement errors. The last two columns give the 
uncertainties arising from finite catalogue size with 5 000 and 10 000 stars in an error-free catalogue. 

Gaia-like errors perfect data 

fi only (i, zu fi, taj, vi os 5 000 stars 10 000 stars 

ovo 10.06 ±0.14 10.06 ±0.09 10.044 ± 0.075 10.02 ± 0.07 10.06 ± 0.05 
rr z0 10.08 ±0.10 10.09 ± 0.06 10.03 ± 0.06 9.97 ± 0.05 9.97 ± 0.04 



ney & McMillan 2011; Schonrich & Binney 2011). Then by 
interpolating between these values we can assess whether a 
given point J is well outside the region of action space that 
can contribute to the catalogue. 

To add a star to the catalogue, we select a torus from 
our list at random. Then we choose a value for each 9i at 
random between and 27r, and an absolute magnitude M 
randomly from our luminosity function (eq. 26) . This defines 
the position, velocity, and apparent magnitude of a star. Ob- 
servational errors are then added to the observational data 
for each star. The assumed errors were 0.2 mas in parallax, 
0.2 mas yr -1 in proper motion, and 5kms _1 in line-of-sight 
velocity. The star enters the catalogue if its observational 
data place it within our survey limits, and is discarded oth- 
erwise. We repeat this process until we have produced a 
catalogue of the desired size. 

6.5 Case of only a thin disc 

Fig. 2 shows the pdfs of the parameters ovo and <r z o for 
the pure thin-disc model when there are data for 5 000 stars 
and only photometry and proper motions are available (left- 
most panel), or parallaxes are also available (central panel), 
or both parallaxes and line-of-sight velocities are available 
(rightmost panel). The means and dispersions of these pdfs 
are given by the last three columns of Table 2. We see that 
in all three cases, the true parameter values lie within the 
expected error ellipse and the uncertainties of the param- 
eters are < 1.5 per cent. We obtained these results using 
40 000 tori with non- vanishing values of 0(J). 

The situation when neither parallaxes nor line-of-sight 
velocities are available is of particular interest. Given the 



breadth of the luminosity function plotted in Fig. 1, it is re- 
markable that the parameters can be determined with such 
precision because to do this, the algorithm has to correctly 
infer the typical velocities of stars, and it can deduce a veloc- 
ity from a given proper motions only by correctly inferring 
the distance. By itself the broad luminosity function does 
not provide sufficient distance information. The required dis- 
tance information is inferred by piecing together several lines 
of evidence. For example, in the given potential, a model 
with larger values of ovo and a z o would have a thicker disc, 
so stars seen at b — 30° along I = and I = 180° would typ- 
ically have Galactocentric radii that differed more than in a 
model with small <Tio- Consequently, the ratio of the stellar 
densities seen along I = and I — 180° increases with the 
Uio, and from the ratio present in the data the algorithm can 
constrain the Oio- An independent constraint is provided by 
the Sun's motion with respect to the LSR, since both the 
asymmetric drift of the disc and the typical distance to stars 
vary with the <no, so the dispersions are constrained by the 
mean proper motion of disc stars. This is essentially the 
classical concept of secular parallax (e.g. §2.2.3 of Binney & 
Merrifield 1998) in a sophisticated context. 

When only proper motions are available, the error el- 
lipse is tilted with respect to the axes in the sense of a posi- 
tive correlation between o>o and a z o- The correlation arises 
because increasing a z o without increasing ovo would, for ex- 
ample, at I ~ 90° increase /i b without increasing /i;, and this 
would be apparent in the data. The impact on the data of 
increasing both dispersions in step can be to some extent 
masked by increasing the assumed distances to stars. The 
central panel of Fig. 2 shows that adding parallaxes elim- 
inates this correlation. The panel on the extreme right of 
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Figure 3. A bias introduced by using too few tori. Here only 
2 000 tori have been used to recover the parameters of the DF of 
the single-disc model. 
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Figure 4. The pdf in the (a>o, ct z o) plane for the same catalogue 
as that used in rightmost panel of Fig. 2, when an incorrect trial 
DF (with ovo = <7 z o = 12kms _1 ) was used to perform the original 
integral over J. 



Fig. 2 shows that for this model little is gained by adding 
line-of-sight velocities with errors of 5kms _1 , presumably 
because the random motions of the stars are so small. 



6. 5. 1 Case of too few tori 

In Section 5 we noted that using too few tori is liable 
to produce biased results. This phenomenon is illustrated 
by Fig. 3, which shows the same results as the rightmost 
panel of Fig. 2 except that 2 000 rather than 40 000 tori 
have been used in the analysis. The algorithm deduces that 
oy = (9.85 ± 0.07) kms" 1 , o z0 = (10.23 ± 0.06) km s" 1 so 
the central values lie over 3a from the truth. 



6.5.2 Choice of trial distribution function 

The Monte-Carlo integrals over J in equation (17) are per- 
formed using an initial trial DF, which then allows us to 
evaluate the likelihood of similar dfs using the substitution 
given in equation (23). In the other cases reported in this 
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Figure 5. The pdf in the (oyo>Czo) plane when the only source 
of uncertainty is the finite size of the catalogue. 



study we use the DF from which the stars were drawn as this 
trial DF, for convenience. To use this method in practice we 
need to be confident that it returns a correct result when a 
different trial DF is used. 

In Fig. 4 we show the pdf in the (a r0 , a z0 ) plane found 
for the thin-disc DF using the same input catalogue as that 
used to produce the rightmost panel of Fig. 2 and the right- 
most column of Table 2, but with an initial trial DF which 
is a "quasi-isothermal" disc with a r o = cr z o = 12kms _1 . 
The algorithm finds the results a r0 = (10.07 ± 0.08) kms -1 , 
o~ z o = (10.06 ± 0.06) kms -1 , which is very close to the re- 
sult found with the correct trial DF. Using a trial DF which 
differs from the true DF does reduce the effective number of 
tori which are used to determine the true DF, as it places an 
excessive number of tori in some regions of limited interest 
in action space, leaving a reduced number of tori in the rel- 
evant regions in action space. If the DF determined by the 
algorithm differs significantly from the trial DF it is sensible 
to use the DF that is determined as a new initial trial DF, 
and repeat the analysis to ensure that the results are not 
biased by the low effective number of tori. 

6.5.3 Irreducible statistical error 

Even with error-free data it cannot be possible to constrain 
the parameters in the DF tightly if the catalogue is small. 
To determine the size of the uncertainty that is inherent in 
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Figure 6. As Fig. 2 but for the case of both thin and thick discs and using a catalogue with 10 000 stars. The top row shows the pdfs 
for the parameters of the thin disc after marginalisation over the parameters of the thick disc, while the bottom row shows those for the 
thick disc, similarly marginalised. 



using only a finite number of stars we must do two things: 
(i) consider a catalogue with error-free data, and (ii) anal- 
yse this catalogue using precisely the tori that were used to 
generate the catalogue. This second step eliminates any bias 
arising from the use of a finite number of tori to evaluate the 
integral over J by Monte-Carlo integration by the following 
argument. With error- free data, only the star's true torus 
will assign a non-zero probability to the star, so it is impor- 
tant to be sure that this torus is present. Given that it is 
present, the Monte-Carlo estimate of the integral over J will 
be exact, so the only source of uncertainty in the parameters 
of the df will be the finite size of the catalogue. 

Fig. 5 shows the results of running this test on the 
single-disc model when we have perfect measurements of 
proper motion, parallax and v\ OB for 5 000 stars (top) or 
10 000 stars (bottom). The means and dispersions of the 
parameters in each case are given in the first two columns 
of Table 2. By resampling the df to obtain different discrete 
realisations, one can directly verify that the formal errors 
shown in Table 2 are a fair indication of the uncertainty 
in the df that is inherent in the size of the stellar sample. 
The decrease in the uncertainty of both a r o and a Z Q with 
increasing number iV of stars in the catalogue is consistent 
with (T;o oc 1/y/N. With 5000 stars the irreducible statis- 
tical uncertainty is ~ 50 percent of the uncertainty with 
Gaia-like errors in the proper motions, even with no further 
information. The addition of the parallax to the data set 
reduces the uncertainty with Gaia-like errors to very little 
more than the uncertainty with perfect data. 



Table 3. Means and dispersions in kms -1 of the parameters of 
the DF recovered from data made with a Galaxy that has both 
thin and thick discs, the analysed catalogue containing 10 000 
stars 

H only /i, w n, w, v\ os 

ovo 25.7 ±0.99 27.2 ± 0.39 27.1 ±0.33 
o- z0 19.4 ±0.78 20.3 ±0.32 19.9 ± 0.26 

oy 48.8 ±0.91 48.8 ± 0.47 48.3 ± 0.34 
(t z0 43.3 ±0.53 43.7 ±0.29 43.6 ± 0.25 

6.6 Case of both thin and thick discs 

Fig. 6 and Table 3 show results for the more realistic case 
of two discs with realistic velocity dispersions. In all three 
cases the catalogue contained 10 000 stars and 100 000 tori 
were used in the analysis. 

When only proper motions are available, the correlation 
between ovo and a z o is now rather strong for both thin and 
thick discs. Adding parallaxes produces a more dramatic 
improvement in precision than in the case of the single-disc 
model. Adding line-of-sight velocities also produces a mate- 
rial improvement in accuracy, presumably because the errors 
of 5kms _1 in the data are a smaller fraction of typical ran- 
dom velocities than in the case of the pure thin-disc model. 
In this test the fraction of the stars in each disc was fixed at 
its true value. If this fraction is allowed to vary, the recovered 
parameters have larger uncertainties because a trade-off is 
possible between the velocity dispersion of the thicker com- 
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ponent and the fraction of the stars it carries: the lower the 
dispersion, the more stars it can carry. In the real world this 
degeneracy must be broken by using chemical information: 
the thick disc is fundamentally the component with high 
[a/Fe]. 



7 DISCUSSION 

In this paper we have laid out the formalism of an ap- 
proach to the interpretation of data from surveys, and have 
shown that it works in principle. However, we have only 
scratched the surface of the overall problem. Within the ex- 
isting framework, we need to: 

• Chart the growth in the uncertainties of the parameters 
as the measurement errors of the observations increase. 

• Determine the precision to which the Galactic potential 
can be determined by fitting the potential in parallel with 
the df, rather than considering it known. 

The only barrier to investigating these questions is com- 
putational cost: if we change either the measurement errors 
or the gravitational potential, we have to re-evaluate the 
likelihood of the catalogue from scratch rather than by re- 
using the saved contributions of each torus to each star. 
The second of the questions above is of particular interest 
because upon its answer depends the prospects for pinning 
down the Galaxy's dark-matter density to good precision. 

The computational cost of evaluating the likelihood of 
a catalogue is dominated by the integrals of sight-lines in 
equation (21). Since there is no dependence between these 
integrals for different stars, the computation is parallelisable, 
and we do employ multiple cores. 

Two other directions requiring urgent exploration are: 

• Extend the formalism to models that possess more than 
one stellar population, each population being characterised 
by its own df and distribution in the colour-luminosity 
plane. 

• Extend the formalism to include additional observables, 
such as colour indices, measures of [Fe/H], [a/Fe], etc. 

The first of these directions is conceptually straightfor- 
ward and not especially computationally costly. It is manda- 
tory in the sense that it would introduce correlations be- 
tween kinematics, chemistry and the luminosity function 
that were first identified in the Galaxy over half a century 
ago. Operationally it is easy: one simply takes the df to be a 
superposition of dfs with a different colour-luminosity dis- 
tribution for each component df. The presence of more than 
one population in the system should sharpen our ability to 
constrain the gravitational potential from a given catalogue 
as it does in the simpler context of dwarf spheroidal galaxies 
(Amorisco & Evans 2011). 

One could extend the range of observables considered 
in a trivial way, but at some point it becomes sensible to 
widen the scope of the scheme to include both a model of 
the chemical evolution of the Galaxy and the strong con- 
straints on stellar parameters yielded by the theory stellar 
evolution. We hope to publish details of such a formalism 
shortly. The scope of the exercise is then broadened to diag- 
nosing the history of the Galaxy as well as its present state, 



and the resulting constraints on df and gravitational poten- 
tial would reflect the information regarding stellar distances 
that is normally embodied in "photometric distances" . 
An issue that must at some point be addressed is: 

• The impact of using an inappropriate form of the df. 

The df of the Galaxy certainly does not consist of a 
superposition of the quasi-isothermal dfs we have employed 
here, and the question arises as so how accurately the real 
df can be represented by our functional form (or some other 
tractable form). That is, once we have found the best-fitting 
df of our form, we want to answer the question "is our cat- 
alogue statistically consistent with being drawn from the 
chosen df." At an elementary level this question is read- 
ily addressed by drawing pseudo-catalogues from the model 
df and for each such catalogue calculating L from equa- 
tion (18) - if the value of L obtained from the real cata- 
logue lies within the range of those obtained for the pseudo- 
catalogues, then the model df is clearly consistent with the 
data. In reality this exercise will always reveal that L for the 
real catalogue lies outside the range defined by the pseudo- 
catalogues, because the Galaxy is an extremely complicated 
system, full of structure that is not included in the model 
DF. Before we can pass judgement on our fitted df we have 
to understand what it fails to represent: are there large-scale 
discrepancies between the predictions of the df and the ob- 
servations, or does it merely fail to include small-scale fluc- 
tuations that one is not endeavouring to represent at this 
level? Given the limitations on binning the real catalogue 
that were described in Section 4.2, addressing this question 
will be hard. 

An issue in this area that can be more easily addressed 
is "what impact the use of an inappropriate form of the df 
will have on the inferred gravitational potential and dark- 
matter density." We have to expect that an inappropriate 
form of the df will distort our perception of the gravitational 
potential because when the df is wrong, we cannot expect 
the likelihood of the data to be maximised by the true poten- 
tial. Clearly an erroneous potential will yield an erroneous 
dark-matter distribution. This question could be rigorously 
addressed by drawing catalogues from N-body simulations 
and comparing the potential and dark-matter densities in- 
ferred from the catalogue with the real ones. 

A final topic that merits discussion is 

• Optimising the design of surveys 

Large resources are currently being invested in surveying 
the Galaxy, and in principle it would be useful before com- 
mitting these resources to determine the most cost-effective 
strategy by asking questions such as "is it more useful to 
measure TV line-of-sight velocities with errors of 5kms _1 or 
N/2 velocities with errors of 3kms _1 ? "Would it be more 
useful to increase by half a magnitude the magnitude limit 
of a satellite's parallax measurements, or to measure line- 
of-sight velocities for a tenth of the stars? The methods de- 
scribed in this paper make it possible to compute precise 
answers to such questions. 

In this context an obvious step to take now is to inves- 
tigate the degeneracies between the various parameters of 
Galaxy models, and ask which combination of data types 
is most effective at breaking such degeneracies. In this pa- 
per we varied only the velocity scales of the df. Ideally we 
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would also vary the other parameters in the df, such as the 
scale lengths _Rd and the relative normalisations of the thin 
and thick discs, and also the parameters of the gravitational 
potential, the Sun's velocity, etc. The extent to which there 
are degeneracies between these parameters, will depend on 
the richness of the data, and it behoves us to understand 
the relative power of different data sets before we design a 
survey. 



8 CONCLUSIONS 

It seems inevitable that we will extract the promised science 
from current and upcoming surveys of the Galaxy by com- 
paring the surveys' catalogues with models that have been 
"observed" with biases matched to those of the surveys. A 
major goal of the surveys is to map the Galaxy's dark-matter 
distribution, which is possible only in so far as the Galaxy 
can be presumed to be in a steady state. Consequently equi- 
librium dynamical models are of particular importance. 

The number of variables that are commonly measured 
for each star is large - in addition to six phase-space coordi- 
nates and the apparent magnitude, one or more colours, the 
effective temperature, the surface gravity and two measures 
of metallicity are routinely measured because our ambitions 
to unravel the star-formation and metal-enrichment histo- 
ries of the Galaxy turn on the availability of such data for 
millions of stars. In Section 4.2 we saw that even if such data 
were available for a billion stars, only the crudest estimate 
of the density of stars in the high-dimensional data space 
could be obtained by binning the data. Therefore it seems 
inevitable that models will have to be fitted to the data by 
maximising the likelihood of the data given a model. 

These likelihoods can be calculated only if we have the 
pdf of the model in data space. This requirement is a major 
difficulty for N-body models, for which it is in principle no 
easier to determine the pdf than it is for the Galaxy. 

For these reasons we believe models based on analytic 
dfs and orbital tori have unique potential for the scientific 
exploitation of large surveys of the Galaxy. In Section 6.4 we 
described how to produce a discrete realisation from a torus 
model, and we have made extensive use of such realisations 
in our tests. 

In Section 4 we developed the formalism required to 
determine likelihoods for a torus model under some simpli- 
fications. The principal simplification was the neglect of all 
variables other than sky position, space velocities and ap- 
parent magnitude. Since in this framework there is no pos- 
sibility of distinguishing different types of stars, for example 
metal-poor ones, or a-enhanced ones or constrain the age of 
a star, this level of analysis is only appropriate for a Galaxy 
that consists of a single stellar population. Therefore, this 
exercise is an artificial one, but nonetheless a vital one from 
which we can build towards work of greater complexity and 
realism. 

The key equation (18) expresses the log- likelihood as a 
sum of integrals down the line-of-sight through individual 
tori. Since a large number of tori must be used if unbiased 
results are to be obtained, the computational cost of these 
integrals is large. Fortunately, once they have been evalu- 
ated for a given catalogue and gravitational potential, the 
likelihood of the data given any df can be quickly computed. 



Therefore the optimisation of the df in a given potential is 
st r aight forward . 

In Section 6 we tested the algorithm by using it to re- 
construct the df from catalogues containing either 5 000 or 
10 000 stars with Gaia-like errors. We found that it per- 
formed as expected. The irreducible statistical uncertainty 
in the df scales roughly inversely with the square root of 
the number of stars in the catalogue, and for 5 000 stars 
amounts to ~ 0.5 per cent on the velocity dispersions, which 
is ~ 50 per cent of the uncertainty on the df found with only 
measurements of the proper motion (with Gaia-like uncer- 
tainties). By adding Gaia-like measurements of parallax for 
5 000 stars to those of proper motion, one can drive the un- 
certainty in the df almost down to the irreducible statistical 
uncertainty. 

The uncertainties quoted above are remarkably small 
given the small sizes of the catalogues analysed. The uncer- 
tainties will grow, probably significantly, when the potential 
is varied alongside the df. They will also grow with the 
sophistication of the df, although one may hope that this 
growth will be to a large extent countered by enrichment of 
the data to include spectrophotometric data that must ac- 
company any attempt to decompose the Galaxy into several 
populations with distinct dfs. 

The uncertainties on parameters that we recover include 
the effects of measurement errors and statistical uncertainty 
arising from the finite number of stars in the analysed cata- 
logue, but do not make allowance for the use of only a finite 
number of tori. An insufficient supply of tori will lead to 
biases in the results. The entropy of the probability distri- 
bution defined by equation (24) being small is a sure sign 
that too few tori are being used, but the surest check that 
enough tori are being used is to draw a fresh sample of tori 
from the final df and to re-determine the pdf of the df's pa- 
rameters using these tori. If the pdf differs materially from 
the original one, more tori are required. 

It is remarkable that the df can be recovered from 
proper motions alone because with the broad luminosity 
function used here the data by themselves contain negli- 
gible distance information. The distance information that is 
required to pin down the df and thus establish the typical 
velocities of stars is provided by the gravitational potential, 
which sets a scale-height at any velocity dispersion, and by 
the solar motion through the logic of traditional secular par- 
allaxes. Since only limited distance information is available 
when the data are restricted to proper motions, the two ve- 
locity scales of the df, a r o and a z o have correlated errors. 

Adding parallaxes to the data set eliminates this corre- 
lation as well as diminishing the scale of the uncertainties. 
Further adding line-of-sight velocities with uncertainties of 
5kms~ has at most a small impact on the results. 

In Section 7 we discussed several directions for further 
work. One urgent step is to extend the formalism to in- 
clude the spectral characteristics of stars, such as metallic- 
ity, colour, effective temperature and surface gravity. The 
main issue here is how best to extend our models to predict 
their distributions. There is more than one way that this 
can be done, so we reserve this topic for a future paper. A 
related issue is how best to combine constraints from dif- 
ferent surveys, for example an astrometric survey such as 
Gaia with a spectroscopic one such as the ESO-Gaia survey. 
One option is to analyse the catalogues one after the other, 
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using constraints obtained from the first analysis to define 
the prior used in the second analysis. Alternatively it may 
be possible to refine the definition of the selection function 
4>(J) in such a way that catalogues with different selection 
criteria can be analysed simultaneously. 

In this paper we have neglected extinction. Given a 
three-dimensional model of the interstellar medium, it would 
be straightforward if computationally costly, to allow for 
reddening and extinction. Ideally, the model of the ISM 
would be refined in parallel with the Galaxy model, but it is 
not yet apparent how this could be accomplished in practice. 

A key topic is the extent to which the Galaxy's grav- 
itational potential can be constrained by various bodies of 
data. In principle this is a trivial extension of the present 
work, but it is computationally expensive. 

Computational cost is a real issue. The cost of analysing 
a catalogue is proportional to the number of its stars times 
the mean number of tori n„ that might make a non-negligible 
contribution to each star's probability. Although the total 
number of tori needed increases with the precision of the 
data, n, should be roughly constant between catalogues, so 
computational cost should scale with the number of stars 
in the catalogue. However, this situation may be improved 
by binning stars by position on the sky, and evaluating the 
appropriate line-of-sight integral (eq. 21) simultaneously for 
all the stars in a bin for a given torus, under the approxima- 
tion that their position on the sky is that of the centre of the 
bin. Approaches like this which reduce computational effort 
are likely to prove essential in scaling up this algorithm from 
working with the pseudo-catalogues of 10 000 stars used here 
to analysing surveys such as RAVE, with < 500 000 stars 
observed and, looking further ahead, the ~10 9 stars in the 
Gaia catalogue. 
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APPENDIX: PROOF THAT L IS STATIONARY 
FOR THE TRUE DF 

By the Monte-Carlo integration theorem, the sum over a in 
the definition (18) of the likelihood L can be replaced by 
an integral over u times the pdf of the stars in that space, 
which is Pr (u|/ t ), where ft is the true DF. Hence 



= J d 7 uPr (u|/ t )ln[Pr (u|/)], 
~' h) Pro(u|/) 



f a =/dWu|/^ Pr ° (u|/)/aa 



(30) 



(31) 
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When f = ft the denominator cancels with the pdf on top 
and we find 



8L 
da 



= f d 7 u dPr (u\f t ) 



(32) 



This vanishes because it is the derivative of 
Jd 7 uPr (uj/ t ) = l. 



